This report collects and examines selected techniques for imputing missing data, focusing on their impact on the predictive performance of classification algorithms. It covers both basic techniques (median/mode imputation) and more sophisticated ones (selected methods from the missForest, VIM, mice or missMDA packages).
For testing, we used ranger as the classification algorithm; it is a fast implementation of random forests, particularly suited to high-dimensional data. Prediction effectiveness was assessed with three measures: AUC, balanced accuracy (BACC) and the Matthews correlation coefficient (MCC).
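
The three measures can be illustrated on a toy binary problem. The following is a minimal Python sketch, not the code used to produce the report (which was generated in R); the labels, scores, the 0.5 threshold and all function names are assumptions made for illustration. It assumes both classes are present in the data.

```python
from math import sqrt

def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def balanced_accuracy(tp, fp, tn, fn):
    """Mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

def matthews_corrcoef(tp, fp, tn, fn):
    """MCC; defined as 0 when any marginal count is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def auc(y_true, scores, positive=1):
    """Rank-based AUC: share of (positive, negative) pairs in which the
    positive example has the higher score; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predicted probabilities (illustration only).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

counts = confusion_counts(y_true, y_pred)
bacc = balanced_accuracy(*counts)
mcc = matthews_corrcoef(*counts)
auc_value = auc(y_true, scores)
print(f"AUC: {auc_value:.3f}  BACC: {bacc:.3f}  MCC: {mcc:.3f}")
```

Unlike AUC, both BACC and MCC depend on the chosen classification threshold, which is one reason the three measures can rank imputation methods differently on the same dataset.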

The report contains all the results, grouped by both package and dataset.

Basic (median/mode)
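
As an illustration of the technique named in this heading, the sketch below fills numeric columns with their median and all other columns with their mode. It is a minimal Python sketch under stated assumptions, not the report's implementation: the column names, the use of `None` as the missing-value marker, and the choice to learn the fill values on the training set and reuse them on the test set are all hypothetical.

```python
from collections import Counter
from statistics import median

def fill_values(columns):
    """Per-column fill value: median for numeric columns, mode otherwise.
    `columns` maps a column name to a list of values, with None for missing."""
    fills = {}
    for name, values in columns.items():
        observed = [v for v in values if v is not None]
        if not observed:
            continue  # nothing to learn from a fully missing column
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in observed
        )
        if is_numeric:
            fills[name] = median(observed)
        else:
            # most frequent value; ties go to the value seen first
            fills[name] = Counter(observed).most_common(1)[0][0]
    return fills

def impute(columns, fills):
    """Replace None with the column's fill value, if one is available."""
    return {
        name: [fills.get(name) if v is None else v for v in values]
        for name, values in columns.items()
    }

# Hypothetical toy data (illustration only).
train = {
    "age": [25, None, 40, 31],
    "workclass": ["Private", "Private", None, "State-gov"],
}
test = {"age": [None, 50], "workclass": [None, "Private"]}

fills = fill_values(train)            # assumption: learned on train only
train_imputed = impute(train, fills)
test_imputed = impute(test, fills)    # assumption: train values reused on test
print(train_imputed)
print(test_imputed)
```

Because the fill values are simple column summaries, this approach is very cheap, which is consistent with the sub-second imputation times reported for this method.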

adult

Crossvalidation results

Imputation times

## Train set imputation time:  0.098
## Test set imputation time:  0.03

Test set results

## Test set AUC:  0.916
## Test set BACC:  0.781
## Test set MCC:  0.604

Missings overview

eucalyptus

Crossvalidation results

Imputation times

## Train set imputation time:  0.005
## Test set imputation time:  0.004

Test set results

## Test set AUC:  0.951
## Test set BACC:  0.889
## Test set MCC:  0.783

Missings overview

dresses-sales

Crossvalidation results

Imputation times

## Train set imputation time:  0.005
## Test set imputation time:  0.004

Test set results

## Test set AUC:  0.575
## Test set BACC:  0.591
## Test set MCC:  0.215

Missings overview

credit-approval

Crossvalidation results

Imputation times

## Train set imputation time:  0.005
## Test set imputation time:  0.004

Test set results

## Test set AUC:  0.93
## Test set BACC:  0.874
## Test set MCC:  0.752

Missings overview

sick

Crossvalidation results

Imputation times

## Train set imputation time:  0.029
## Test set imputation time:  0.013

Test set results

## Test set AUC:  0.996
## Test set BACC:  0.927
## Test set MCC:  0.898

Missings overview

SpeedDating

Crossvalidation results

Imputation times

## Train set imputation time:  0.087
## Test set imputation time:  0.043

Test set results

## Test set AUC:  1
## Test set BACC:  0.98
## Test set MCC:  0.976

Missings overview

cylinder-bands

Crossvalidation results

Imputation times

## Train set imputation time:  0.01
## Test set imputation time:  0.007

Test set results

## Test set AUC:  0.903
## Test set BACC:  0.667
## Test set MCC:  0.459

Missings overview

Mice

adult

Crossvalidation results

Imputation times

## Train set imputation time:  5.281
## Test set imputation time:  0.707

Test set results

## Test set AUC:  0.915
## Test set BACC:  0.776
## Test set MCC:  0.598

Missings overview

eucalyptus

Crossvalidation results

Imputation times

## Train set imputation time:  0.094
## Test set imputation time:  0.059

Test set results

## Test set AUC:  0.966
## Test set BACC:  0.903
## Test set MCC:  0.81

Missings overview

dresses-sales

Crossvalidation results

Imputation times

## Train set imputation time:  0.345
## Test set imputation time:  0.176

Test set results

## Test set AUC:  0.494
## Test set BACC:  0.492
## Test set MCC:  -0.017

Missings overview

credit-approval

Crossvalidation results

Imputation times

## Train set imputation time:  0.076
## Test set imputation time:  0.102

Test set results

## Test set AUC:  0.905
## Test set BACC:  0.85
## Test set MCC:  0.697

Missings overview

sick

Crossvalidation results

Imputation times

## Train set imputation time:  0.232
## Test set imputation time:  0.11

Test set results

## Test set AUC:  0.996
## Test set BACC:  0.907
## Test set MCC:  0.874

Missings overview

SpeedDating

Crossvalidation results

Imputation times

## Train set imputation time:  169.971
## Test set imputation time:  48.04

Test set results

## Test set AUC:  1
## Test set BACC:  0.969
## Test set MCC:  0.963

Missings overview

cylinder-bands

Crossvalidation results

Imputation times

## Train set imputation time:  1540.894
## Test set imputation time:  107.896

Test set results

## Test set AUC:  0.931
## Test set BACC:  0.83
## Test set MCC:  0.69

Missings overview

K-Nearest Neighbors (VIM)

adult

Crossvalidation results

Imputation times

## Train set imputation time:  113.919
## Test set imputation time:  7.513

Test set results

## Test set AUC:  0.914
## Test set BACC:  0.775
## Test set MCC:  0.595

Missings overview

eucalyptus

Crossvalidation results

Imputation times

## Train set imputation time:  0.308
## Test set imputation time:  0.104

Test set results

## Test set AUC:  0.966
## Test set BACC:  0.889
## Test set MCC:  0.783

Missings overview

dresses-sales

Crossvalidation results

Imputation times

## Train set imputation time:  0.499
## Test set imputation time:  0.117

Test set results

## Test set AUC:  0.589
## Test set BACC:  0.607
## Test set MCC:  0.234

Missings overview

credit-approval

Crossvalidation results

Imputation times

## Train set imputation time:  0.115
## Test set imputation time:  0.056

Test set results

## Test set AUC:  0.923
## Test set BACC:  0.83
## Test set MCC:  0.667

Missings overview

sick

Crossvalidation results

Imputation times

## Train set imputation time:  6.764
## Test set imputation time:  0.645

Test set results

## Test set AUC:  0.996
## Test set BACC:  0.927
## Test set MCC:  0.898

Missings overview

SpeedDating

Crossvalidation results

Imputation times

## Train set imputation time:  305.139
## Test set imputation time:  21.198

Test set results

## Test set AUC:  1
## Test set BACC:  0.978
## Test set MCC:  0.974

Missings overview

cylinder-bands

Crossvalidation results

Imputation times

## Train set imputation time:  0.899
## Test set imputation time:  0.362

Test set results

## Test set AUC:  0.926
## Test set BACC:  0.814
## Test set MCC:  0.648

Missings overview

Hot Deck (VIM)

adult

Crossvalidation results

Imputation times

## Train set imputation time:  0.085
## Test set imputation time:  0.035

Test set results

## Test set AUC:  0.914
## Test set BACC:  0.779
## Test set MCC:  0.605

Missings overview

eucalyptus

Crossvalidation results

Imputation times

## Train set imputation time:  0.031
## Test set imputation time:  0.027

Test set results

## Test set AUC:  0.965
## Test set BACC:  0.901
## Test set MCC:  0.811

Missings overview

dresses-sales

Crossvalidation results

Imputation times

## Train set imputation time:  0.041
## Test set imputation time:  0.026

Test set results

## Test set AUC:  0.594
## Test set BACC:  0.594
## Test set MCC:  0.207

Missings overview

credit-approval

Crossvalidation results

Imputation times

## Train set imputation time:  0.033
## Test set imputation time:  0.026

Test set results

## Test set AUC:  0.933
## Test set BACC:  0.868
## Test set MCC:  0.733

Missings overview

sick

Crossvalidation results

Imputation times

## Train set imputation time:  0.058
## Test set imputation time:  0.046

Test set results

## Test set AUC:  0.995
## Test set BACC:  0.917
## Test set MCC:  0.886

Missings overview

SpeedDating

Crossvalidation results

Imputation times

## Train set imputation time:  0.614
## Test set imputation time:  0.345

Test set results

## Test set AUC:  1
## Test set BACC:  0.964
## Test set MCC:  0.956

Missings overview

cylinder-bands

Crossvalidation results

Imputation times

## Train set imputation time:  0.073
## Test set imputation time:  0.062

Test set results

## Test set AUC:  0.922
## Test set BACC:  0.849
## Test set MCC:  0.708

Missings overview

missRanger

adult

Crossvalidation results

Imputation times

## Train set imputation time:  28.923
## Test set imputation time:  4.688

Test set results

## Test set AUC:  0.915
## Test set BACC:  0.78
## Test set MCC:  0.604

Missings overview

eucalyptus

Crossvalidation results

Imputation times

## Train set imputation time:  0.601
## Test set imputation time:  0.208

Test set results

## Test set AUC:  0.959
## Test set BACC:  0.883
## Test set MCC:  0.769

Missings overview

dresses-sales

Crossvalidation results

Imputation times

## Train set imputation time:  0.492
## Test set imputation time:  0.151

Test set results

## Test set AUC:  0.587
## Test set BACC:  0.57
## Test set MCC:  0.149

Missings overview

credit-approval

Crossvalidation results

Imputation times

## Train set imputation time:  0.548
## Test set imputation time:  0.209

Test set results

## Test set AUC:  0.919
## Test set BACC:  0.846
## Test set MCC:  0.695

Missings overview

sick

Crossvalidation results

Imputation times

## Train set imputation time:  1.787
## Test set imputation time:  0.455

Test set results

## Test set AUC:  0.996
## Test set BACC:  0.917
## Test set MCC:  0.886

Missings overview

SpeedDating

Crossvalidation results

Imputation times

## Train set imputation time:  84.537
## Test set imputation time:  19.272

Test set results

## Test set AUC:  1
## Test set BACC:  0.977
## Test set MCC:  0.972

Missings overview

cylinder-bands

Crossvalidation results

Imputation times

## Train set imputation time:  1.24
## Test set imputation time:  0.478

Test set results

## Test set AUC:  0.933
## Test set BACC:  0.872
## Test set MCC:  0.768

Missings overview